Security Level Classification of Confidential Documents Written in Turkish

Authors

  • Erdem Alparslan
  • Hayretdin Bahsi
Abstract

This article introduces a methodology for classifying the security level of confidential documents written in Turkish. Internal documents of TUBITAK UEKAE holding various security levels (unclassified, restricted, secret) were classified using Support Vector Machines (SVMs) [1] and naive Bayes classifiers [3][9]. To represent term-document relations, the recommended "TF-IDF" metric [2] was chosen to construct a weight matrix. Compared with English, Turkic languages pose a particularly difficult natural language processing problem: stemming. A Turkish stemming tool, "Zemberek", was used to extract the features without their suffixes. The article closes with experimental results and success metrics.
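The weight-matrix step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant the authors used, so the code below assumes the common form tf(t, d) * log(N / df(t)), with term frequency normalized by document length. The function name `tfidf_matrix` and the toy documents are hypothetical, and the tokens are assumed to be already stemmed (the sketch does not call Zemberek).

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, weights); weights[i][j] is the TF-IDF of vocab[j] in docs[i].

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        if doc:
            row = [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        else:
            # An empty document contributes an all-zero row.
            row = [0.0 for _ in vocab]
        weights.append(row)
    return vocab, weights


# Hypothetical, already-stemmed toy documents (not taken from the paper's corpus).
docs = [
    ["gizli", "rapor", "proje"],
    ["rapor", "toplantı", "duyuru"],
    ["gizli", "anahtar", "proje", "gizli"],
]
vocab, weights = tfidf_matrix(docs)
```

Each row of `weights` is the feature vector of one document; in a pipeline like the one the abstract describes, such rows would be passed to an SVM or naive Bayes classifier together with the security-level labels.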


Similar resources

Security-level classification for confidential documents by using adaptive neuro-fuzzy inference systems



Symmetric Key Size for Different Level of Information Classification

Information is an important asset to an organization as well as to a nation. Incorrect handling of information may cause economic damage to an organization or cause harm to national security. Some of the information is confidential or sensitive. Confidential information can be categorized into various levels of classification. The classification depends on the level of damage to an organization...


Text Classification for Data Loss Prevention

Businesses, governments, and individuals leak confidential information, both accidentally and maliciously, at tremendous cost in money, privacy, national security, and reputation. Several security software vendors now offer "data loss prevention" (DLP) solutions that use simple algorithms, such as keyword lists and hashing, which are too coarse to capture the features that make sensitive docum...


Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents

Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents' contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...




Publication date: 2009